Model Development using PySpark in Jupyter Notebook

Navigate to the ‘Notebooks’ tab of your project and open the ‘Create and Save Model’ notebook using Jupyter with the latest (highest-version) Spark environment. You can select the environment by clicking the three vertical dots to the right of the notebook name.

Follow the notebook instructions and execute all cells as directed.

Please note the following:

  1. You can read the data from a database/data warehouse, a virtual data source, or a flat file, depending on what you set up in the previous Data Ingestion and Data Organization step. Run only the notebook cells that match your data source.

  2. When saving the model, use a model name prefixed with your name or user ID. This ensures that your model name does not clash with a name already used by someone else.

  3. Wherever user credentials are needed, use your user ID and password.

  4. Wherever URLs are needed, use the Cloud Pak for Data URL (IP/host name and port) provided to you.
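
Notes 2 and 4 can be illustrated with a small Python sketch. This is not taken from the notebook: the helper names `prefixed_model_name` and `cpd_url`, the example user ID, host, and port, and the `https` scheme are all assumptions made for illustration. Substitute the values provided to you.

```python
import re
from urllib.parse import urlunsplit


def prefixed_model_name(user_id: str, base_name: str) -> str:
    """Return base_name prefixed with a sanitized user ID (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into one underscore
    # so the prefix is safe to use inside a model name.
    prefix = re.sub(r"[^A-Za-z0-9]+", "_", user_id.strip()).strip("_")
    if not prefix:
        raise ValueError("user_id must contain at least one letter or digit")
    return f"{prefix}_{base_name}"


def cpd_url(host: str, port: int, scheme: str = "https") -> str:
    """Build a base URL from host and port (hypothetical helper).

    The https scheme is an assumption; use whatever was provided to you.
    """
    return urlunsplit((scheme, f"{host}:{port}", "", "", ""))


# Example values only; replace them with your own user ID, host, and port.
model_name = prefixed_model_name("jane.doe", "churn_model")  # "jane_doe_churn_model"
base_url = cpd_url("cpd.example.com", 31843)  # "https://cpd.example.com:31843"
```

Prefixing the name in one place makes it less likely that a model is saved under a name someone else is already using.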